Improving the Performance of the Google File System to Support Big Data
Abstract
After studying the Google File System (GFS), we identified several methods to improve its performance. GFS is a scalable distributed file system for large, distributed, data-intensive applications. It provides high fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. However, it has some limitations. For example, it uses the same chunk size for both appending and writing data, and this fixed chunk size reduces its performance for append operations. We therefore explain several methods to increase its performance by changing some attributes of a typical Google File System. This paper is divided into five parts: the first part introduces the Google File System, the second presents the performance of a GFS cluster with a 64 MB chunk size, the third shows the performance of real-time GFS clusters, the fourth presents a method to increase the performance of GFS, and the fifth concludes with the effect of variable-size chunks on GFS.
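To make the fixed-chunk limitation concrete, the following Python sketch shows how a GFS-style client can turn a byte offset into a chunk index when every chunk is 64 MB. The original GFS design describes clients computing the chunk index from the byte offset and the fixed chunk size. The sketch contrasts this with a hypothetical variable-size layout. The names `fixed_chunk_index` and `VariableChunkLayout` are illustrative assumptions. This is not the method proposed in this paper, whose details the abstract does not give.

```python
from __future__ import annotations

import bisect

MB = 1024 * 1024
GFS_CHUNK_SIZE = 64 * MB  # fixed chunk size discussed in the abstract


def fixed_chunk_index(offset: int, chunk_size: int = GFS_CHUNK_SIZE) -> tuple[int, int]:
    """Map a file byte offset to (chunk index, offset within chunk).

    With a fixed chunk size, a single integer division is enough.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


class VariableChunkLayout:
    """Hypothetical layout in which each chunk of a file has its own size.

    The cumulative end offset of every chunk is stored, and lookups
    binary-search it, because one division no longer locates the chunk.
    """

    def __init__(self, chunk_sizes: list[int]) -> None:
        if any(size <= 0 for size in chunk_sizes):
            raise ValueError("chunk sizes must be positive")
        self.ends: list[int] = []
        total = 0
        for size in chunk_sizes:
            total += size
            self.ends.append(total)  # exclusive end offset of this chunk

    def locate(self, offset: int) -> tuple[int, int]:
        """Return (chunk index, offset within chunk) for a file byte offset."""
        if offset < 0 or not self.ends or offset >= self.ends[-1]:
            raise ValueError("offset outside the file")
        # First chunk whose exclusive end is strictly greater than offset.
        index = bisect.bisect_right(self.ends, offset)
        start = self.ends[index - 1] if index > 0 else 0
        return index, offset - start
```

For example, `fixed_chunk_index(200 * MB)` returns `(3, 8 * MB)`. The trade-off is that fixed chunks need no per-chunk size metadata, while variable chunks, such as small chunks for appended data, require storing and searching chunk boundaries.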
Similar Resources
A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied to big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....
Implementation of Random Forest Algorithm in Order to Use Big Data to Improve Real-Time Traffic Monitoring and Safety
Nowadays, active traffic management can achieve better performance because of the real-time, large-scale data available in transportation systems. With the growth of big data, actively and appropriately monitoring and improving traffic safety has become a necessity. Performance efficiency and traffic safety are considered important elements in measuring the pe...
Correlation of Big Data with Supply Chain Health Performance in Employees of the Tehran Intelligent Fuel System
Introduction: The dramatic growth of big data and its applications in preventing resource waste and increasing financial performance and supply chain health levels need to be examined from different perspectives. This study aimed to determine the correlation between big data and supply chain health performance in employees of the Tehran Intelligent Fuel System. Methods: In this descriptive cor...
Google File System and Hadoop Distributed File System - An Analogy
Big Data has indeed been the word the IT industry is talking about lately. With the advancement of automation and real-time data processing, it has become a necessity for companies to look for sustainable solutions to store their huge datasets and compute valuable information from them. High performance computing heavily relies on distributed environments to process large chunk...
Improving Data Grids Performance by Using Modified Dynamic Hierarchical Replication Strategy
Abstract: A Data Grid connects a collection of geographically distributed computational and storage resources that enables users to share data and other resources. Data replication, a technique much discussed by Data Grid researchers in recent years, creates multiple copies of a file and places them in various locations to shorten file access times. In this paper, a dynamic data replication strate...